† Corresponding author. E-mail:
Directed networks such as gene regulation networks and neural networks are connected by arcs (directed links). The nodes in a directed network are often strongly interwound by a huge number of directed cycles, which leads to complex information-processing dynamics in the network and makes it highly challenging to infer the intrinsic direction of information flow. In this theoretical paper, based on the principle of minimum-feedback, we explore the node hierarchy of directed networks and distinguish feedforward and feedback arcs. Nearly optimal node hierarchy solutions, which minimize the number of feedback arcs from lower-level nodes to higher-level nodes, are constructed by belief-propagation and simulated-annealing methods. For real-world networks, we quantify the extent of feedback scarcity by comparison with the ensemble of direction-randomized networks and identify the most important feedback arcs. Our methods are also useful for visualizing directed networks.
Directed networks are formed by nodes and arcs (i.e., directed links) pointing from one node to another. They are ubiquitous in biological and technological systems; for instance, neurons in the brain rely on directed synaptic connections to form an information-processing network,[1] and cell regulatory networks contain directed interactions between genes, proteins, and other small molecules.[2–4] Structural properties of directed networks at different scales have been studied in the literature for many years, especially on network small motifs,[4,5] mesoscopic communities,[6,7] strongly connected components,[8,9] and network hierarchical structures.[10–14] A directed network can easily be decomposed into a set of strongly connected components (SCCs). If one represents each SCC by a single vertex and neglects its internal connections, then at this coarse-grained level the directed network is simply a directed acyclic graph and there is no directed cycles among the SCCs.[9,13] Each SCC is itself a maximal subnetwork formed by some nodes and the arcs between them, and any node can reach and be reached by any other node of the same SCC through at least one directed path. Directed cycles are usually abundant in the large SCCs (each of which contains many nodes and arcs), and they cause strong feedback effect and make the information-processing dynamics in the network highly complex.[15,16]
The hierarchical structures within large SCCs of directed networks have not yet been fully investigated except for a few earlier efforts (e.g., Refs. [12], [14], and [16]–[18]). Due to the cyclic nature of a SCC, it appears at first sight to be quite ambiguous or even meaningless to order its nodes in a particular way and to define an intrinsic flow direction.[13] However in this paper we show that the arcs that are most vital for feedback interactions can be identified by collectively considering all the directed cycles of the network. We take an optimization approach based on the so-called principle of minimum feedback,[12] which defines the minimum feedback arc set problem. An integer hierarchical level is assigned to each node of the input network and the resulting level configuration of all the nodes is called a node hierarchy. The node levels in this hierarchy are optimized by two efficient physics-inspired algorithms, SA and BPD, which minimize the total number of feedback arcs (defined as those pointing from lower-level nodes to higher-level nodes).
Given a real-world directed network, we can construct many near-minimum feedback arc sets by repeatedly running the SA or BPD algorithm. The sizes of these constructed sets are very close to each other and are much smaller than the total number of arcs in the network. We also find that, while most of the arcs of the network never appear in any of these feedback arc sets, a few of them appear in almost all of them. As a concrete example, for the Florida food web[19] formed by 128 nodes and 2106 arcs, only six of the arcs need to be classified as feedbacks, (Fig.
By distinguishing feedforward arcs and feedback arcs for a real-world directed network, our work help to reveal the hidden principal direction of flows in the network and the hierarchical organization of the nodes within the strongly connected network components. For biological networks, the identified most important feedback arcs might serve as optimal targets of intervening the system.[20] Our algorithms are also useful for network visualization.[21] The source codes of these algorithms will be publicly available to facilitate analyzing and visualizing biological, technological, and social networks.
The minimum feedback principle serves as a useful framework for understanding the structural properties of directed networks. Since feedback interactions interfere with the default dynamics of a real-world complex system, it should be desirable to delete the redundant feedbacks to increase efficency and decrease cost. We therefore expect the minimum feedback principle to be evolutionarily plausible for real-world complex networks. Even if the feedback minimization is not a major driving force for network evolution and adaptation, the technical results of this work still hold.
Given a directed network G of N nodes and M arcs, with arc density
The node hierarchy problem is essentially equivalent to the feedback arc set problem, a fundamental and famous non-deterministic polynomial hard (NP-hard) problem in computer science[22] (Appendix
We can treat the node hierarchy problem as a statistical mechanical system. Let us define the energy of an arc
We write down the following equilibrium partition function Z to combine the effects of energy and level constraints:
Here ψi is the Boltzmann factor of node i due to its level constraint:
We have solved model (
Fortunately, the equivalence between the optimal node hierarchy problem and the minimum FAS problem means that we can obtain a near-optimal node hierarchy by first constructing a near-minimum FAS. For the latter task, the level constraints of Eq. (
We can iterate the BP equation (
Based on Eqs. (
Let us represent an N-node permutation as a column vector
We first test the BPD and SA algorithms on directed Erdös–Rényi (ER), directed regular random (RR) and directed scale-free (SF) random networks.[32,33] Both ER and RR networks are homogenous, while SF networks are quite heterogeneous in that some nodes have a lot of attached arcs (Appendix
We find that BPD and SA perform almost equally good on all the heterogeneous (SF) random networks and on the homogeneous (ER and RR) networks of arc density
The SA algorithm slightly outperforms BPD for directed ER and RR networks of arc density
In terms of computation time, the BPD and SA algorithms both are efficient on relatively small random directed network instances. They usually reach satisfactory solutions within one or two hours when applied on networks of size
As a demonstration of practical applications, we now apply BPD and SA on a small set of representative real-world directed networks (Table
For these real-world network instances, we again find that the feedback arc sets contructed by BPD and SA are of very similar sizes. It is very likely that near-optimal FAS solutions have been achieved by these two algorithms. The SA algorithm and BPD perform equally good on the four small network instances, but SA slightly outperforms BPD on the three large network instances (Metabolic, Wiki-Vote, P2P-share). We list in Table
For each examined real-world network we also generate 96 replicas with the same connectivity pattern but completely randomized directions of all the simple arcs, and apply SA on them to obtain the expected number
This quantity has a clear statistical meaning. A large positive value of R suggests that the number
As Table
When we repeatedly run the SA or BPD algorithm on the same real-world network instance, we find that the output feedback arc sets are usually not identical although their sizes are almost equal. Most importantly, we find that most of the arcs in the network never appear in any of these constructed feedback arc sets, but some arcs are present in almost all these sets (Fig.
Figure
After the feedback probability
In this paper, we introduced the optimal node hierarchy problem, which is essentially equivalent to the minimum feedback arc set problem, and presented two physics-based algorithms to efficiently solve this problem for random and real-world directed networks. Our BPD and SA algorithms are capable of revealing the hidden hierarchical structure and the principal flow direction of a real-world directed network. Our methods can also be used to discover a small number of arcs which are involved most significantly in feedback interactions. We found that feedback interactions are extremely supressed in some real-world networks.
The methods of this work may have wide practical applications in studies of biological, technological, and social networks and in network engineering. For example, after the intrinsic flow direction in the network has been determined, it may become much more easier to design efficient arc-deletion or arc-addition strategies to improve the functionality of the network and to make it more robust against random failures or intentional attacks. The key feedback arcs identified by our algorithms may serve as optimal targets of intervening the dynamical processes on the network.
A natural extension of the present work is to consider optimal ways of cutting long directed cycles to dismantle a directed network. Similar to the proposal of optimally dismantling an undirected network,[37,38] we may iteratively delete the arcs that are predicted to be most important for long-range feedback interactions to break the original directed network down into many small strongly connected components. Detailed numerical study on this important network optimization problem will be reported in a separate paper.
Directed cycles are large-scale structural aspects of a directed network. They cause complicated global constraints to the node hierarchy and FAS problems. Further efforts are needed to improve the theoretical models and the BPD algorithm of this paper. Indeed the two spin glass models of the present paper still have major shortcomings. Firstly, each node i of the network can take many different level states hi, which considerably slows down the numerical computation. Secondly, the predicted minimum cardinalities of feedback arc sets by the two models differ noticably with each other and with the algorithmic results of BPD and SA. Thirdly, the associated BPD algorithms of the two models perform worse than the SA algorithm on homogeneous random networks of relative large arc densities. We hope these issues will be overcome in the near future by a refined statistical physics model of the minimum feedback arc set problem.
[1] | |
[2] | |
[3] | |
[4] | |
[5] | |
[6] | |
[7] | |
[8] | |
[9] | |
[10] | |
[11] | |
[12] | |
[13] | |
[14] | |
[15] | |
[16] | |
[17] | |
[18] | |
[19] | |
[20] | |
[21] | |
[22] | |
[23] | |
[24] | |
[25] | |
[26] | |
[27] | |
[28] | |
[29] | |
[30] | |
[31] | |
[32] | |
[33] | |
[34] | |
[35] | |
[36] | |
[37] | |
[38] |